Linguistic and Statistical Approaches to Basque Term Extraction

Authors

  • A. Gurrutxaga
  • P. Lizaso
  • X. Saralegi
  • S. Ugartetxea
  • R. Urizar

Abstract

Developing terminology extraction applications for Basque requires prior research on linguistic techniques that meet the requirements of Basque language processing. Because Basque is an agglutinative language, purely statistical methods do not yield satisfactory results for term extraction. We have therefore adopted a hybrid approach: term candidates are first selected using linguistic techniques, and statistical association measures are then applied to them. This paper focuses mainly on the design of the linguistic techniques and gives an overview of the first experimental results. The work is part of Erauzterm, a project to develop a term extraction tool for Basque. The tool is at an early stage of development, and further improvements are planned in the near term. Erauzterm is the first attempt to develop such a tool for Basque.
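The two-stage hybrid pipeline described above (linguistic selection of candidates, then statistical association measures) can be sketched as follows. This is a minimal illustration under stated assumptions, not Erauzterm's implementation: it assumes the text has already been lemmatized and POS-tagged by some Basque morphological analyzer (not shown), the tag names and two-word patterns in `PATTERNS` are hypothetical, and Dunning's log-likelihood ratio (G²) stands in for whichever association measures Erauzterm actually uses, which the abstract does not name. The functions `extract_terms` and `log_likelihood` are illustrative names only.

```python
import math
from collections import Counter

# Hypothetical POS patterns for two-word term candidates (illustration only).
# Basque adjectives usually follow the noun, so NOUN+ADJ is listed with NOUN+NOUN.
PATTERNS = {("NOUN", "NOUN"), ("NOUN", "ADJ")}

Token = tuple[str, str]  # (lemma, POS tag), assumed to come from a lemmatizer


def log_likelihood(n11: int, n1p: int, np1: int, n: int) -> float:
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table.

    n11: count of the pair (w1, w2)
    n1p: count of bigrams whose first lemma is w1
    np1: count of bigrams whose second lemma is w2
    n:   total number of bigrams
    """
    observed = [
        [n11, n1p - n11],
        [np1 - n11, n - n1p - np1 + n11],
    ]
    rows = [n1p, n - n1p]
    cols = [np1, n - np1]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # zero cells contribute nothing to G^2
                e = rows[i] * cols[j] / n  # expected count under independence
                g2 += o * math.log(o / e)
    return 2.0 * g2


def extract_terms(sentences: list[list[Token]]) -> list[tuple[tuple[str, str], float]]:
    """Stage 1: keep adjacent lemma pairs matching a POS pattern.
    Stage 2: rank those candidates by G^2 computed over all lemma bigrams."""
    bigrams: Counter = Counter()
    first: Counter = Counter()
    second: Counter = Counter()
    candidates = set()
    for sent in sentences:
        for (l1, p1), (l2, p2) in zip(sent, sent[1:]):
            bigrams[(l1, l2)] += 1
            first[l1] += 1
            second[l2] += 1
            if (p1, p2) in PATTERNS:  # linguistic filter
                candidates.add((l1, l2))
    n = sum(bigrams.values())
    scored = [
        (pair, log_likelihood(bigrams[pair], first[pair[0]], second[pair[1]], n))
        for pair in candidates
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Counting lemmas rather than surface forms is the point of the linguistic stage for an agglutinative language: inflected forms such as "etxean" and "etxetik" share the lemma "etxe", so counting raw word forms would split one term's frequency across many variants and weaken any statistical measure applied afterwards.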

Similar resources

A XML-Based Term Extraction Tool for Basque

This project combines linguistic and statistical information to develop a term extraction tool for Basque. Because Basque is an agglutinative and highly inflected language, the treatment of morphosyntactic information is vital. In addition, due to the late unification of the language, texts show greater term dispersion than texts in a highly normalized language. The result is a semiautomatic ...

ELexBI, A BASIC TOOL FOR BILINGUAL TERM EXTRACTION FROM SPANISH-BASQUE PARALLEL CORPORA

We present the work done by Elhuyar Foundation in the field of bilingual terminology extraction. The aim of this work is to develop some techniques for the automatic extraction of pairs of equivalent terms from Spanish-Basque translation memories, and to implement those techniques in a prototype. Our approach is based on a previous monolingual extraction of term candidates in each language, the...

Computational Lexicography and Lexicology Elexbi, a Basic Tool for Bilingual Term Extraction from Spanish-Basque Parallel Corpora

We present the work done by Elhuyar Foundation in the field of bilingual terminology extraction. The aim of this work is to develop some techniques for the automatic extraction of pairs of equivalent terms from Spanish-Basque translation memories, and to implement those techniques in a prototype. Our approach is based on a monolingual extraction of term candidates in each language, then the creati...

A Study of Association Measures and their Combination for Arabic MWT Extraction

Automatic Multi-Word Term (MWT) extraction is important for many applications, such as information retrieval, question answering, and text categorization. Although many methods have been used for MWT extraction in English and other European languages, few studies have addressed Arabic. In this paper, we propose a novel hybrid method which combines linguistic and statistical a...

Improving End-User Efficiency Using the Smart/Empire IR System

We attack each task through a combination of statistical and linguistic approaches. The proposed statistical approaches extend existing methods in IR by performing computations within the context of another query or document. The proposed linguistic approaches build on existing work in information extraction and rely on a new technique for trainable partial parsing. In short, our integrated app...

Publication date: 2004